-
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of modern language models that enables task execution without parameter optimization, remain less explored compared to Transformers. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks. Our results show that SSMs perform comparably to Transformers in standard regression ICL tasks, while outperforming them in tasks like sparse parity learning. However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. Our findings suggest that hybrid architectures offer promising avenues for enhancing ICL in language models.
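To make the hybrid idea concrete, here is a minimal PyTorch sketch of one interleaved layer, assuming a simplified gated-convolution stand-in for the Mamba block (the real selective SSM is more involved); class names and layer ordering are illustrative, not the paper's reference MambaFormer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaStub(nn.Module):
    """Simplified stand-in for a selective state-space (Mamba) layer:
    a causal depthwise convolution with input-dependent (SiLU) gating."""
    def __init__(self, d_model, d_conv=4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, d_conv,
                              padding=d_conv - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                         # x: (batch, seq, d_model)
        # pad-then-truncate keeps the convolution causal
        h = self.conv(x.transpose(1, 2))[..., :x.size(1)].transpose(1, 2)
        return self.out(h * F.silu(self.gate(x)))

class HybridBlock(nn.Module):
    """One hybrid layer: SSM-style mixing followed by causal
    self-attention, each behind pre-norm and a residual connection."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ssm = MambaStub(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        x = x + self.ssm(self.norm1(x))
        h = self.norm2(x)
        # boolean causal mask: True marks disallowed (future) positions
        causal = torch.triu(torch.ones(x.size(1), x.size(1),
                                       dtype=torch.bool), diagonal=1)
        x = x + self.attn(h, h, h, attn_mask=causal)[0]
        return x

x = torch.randn(2, 16, 64)                        # (batch, seq, d_model)
print(HybridBlock(64, 4)(x).shape)                # torch.Size([2, 16, 64])
```

Stacking several such blocks gives one plausible way to pair an SSM's sequence mixing with attention's retrieval ability, which is the combination the abstract credits for the hybrid's gains.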
-
Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how even small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and parameter scaling. Additionally, we discuss the challenges associated with length generalization. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction loss for rapidly eliciting arithmetic capabilities.
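As an illustration of the formatting effects the abstract mentions, the short Python sketch below renders the same addition example three ways: plain, with the output digits reversed so the model emits least-significant digits first, and in a chain-of-thought style with per-digit carries. The exact templates are assumptions for illustration, not the paper's training formats.

```python
def plain(a, b):
    # conventional left-to-right format
    return f"{a}+{b}={a + b}"

def reversed_output(a, b):
    # writing the sum least-significant-digit first matches the order in
    # which column addition actually produces digits
    return f"{a}+{b}={str(a + b)[::-1]}"

def chain_of_thought(a, b):
    # spell out per-digit sums and carries as intermediate step results
    da, db = str(a)[::-1], str(b)[::-1]
    steps, carry = [], 0
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        steps.append(f"{x}+{y}+{carry}={s}")
        carry = s // 10
    return f"{a}+{b}: " + "; ".join(steps) + f" -> {a + b}"

for fmt in (plain, reversed_output, chain_of_thought):
    print(fmt(254, 378))
# 254+378=632
# 254+378=236
# 254+378: 4+8+0=12; 5+7+1=13; 2+3+1=6 -> 632
```

The point of the second and third variants is that they align the target string with the order in which the computation naturally proceeds, which is the kind of "simple formatting change" the abstract reports as improving accuracy and sample efficiency.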
-
The precise onset of flowering is crucial to ensure successful plant reproduction. The gene FLOWERING LOCUS T (FT) encodes florigen, a mobile signal produced in leaves that initiates flowering at the shoot apical meristem. In response to seasonal changes, FT is induced in phloem companion cells located in distal leaf regions. Thus far, a detailed molecular characterization of the FT-expressing cells has been lacking. Here, we used bulk nuclei RNA-seq and single nuclei RNA (snRNA)-seq to investigate gene expression in FT-expressing cells and other phloem companion cells. Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and in true leaves differed transcriptionally. Within the true leaves, our snRNA-seq analysis revealed that companion cells with high FT expression form a unique cluster in which many genes involved in ATP biosynthesis are highly upregulated. The cluster also expresses other genes encoding small proteins, including the flowering and stem growth inducer FPF1-LIKE PROTEIN 1 (FLP1) and the anti-florigen BROTHER OF FT AND TFL1 (BFT). In addition, we found that the promoters of FT and the genes co-expressed with FT in the cluster were enriched for the consensus binding motifs of NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1). Overexpression of the paralogous NIGT1.2 and NIGT1.4 repressed FT expression and significantly delayed flowering under nitrogen-rich conditions, consistent with NIGT1s acting as nitrogen-dependent FT repressors. Taken together, our results demonstrate that major FT-expressing cells show a distinct expression profile, suggesting that these cells may produce multiple systemic signals to regulate plant growth and development.
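For readers unfamiliar with the workflow, below is a minimal sketch of a generic snRNA-seq clustering pass using the Scanpy library: normalize, cluster, then ask which cluster has high FT expression. The input file, parameter values, and the use of AT1G65480 as the FT identifier are assumptions for illustration, not the authors' pipeline.

```python
import scanpy as sc

adata = sc.read_h5ad("leaf_nuclei.h5ad")          # hypothetical input file
sc.pp.filter_cells(adata, min_genes=200)          # drop low-quality nuclei
sc.pp.normalize_total(adata, target_sum=1e4)      # depth normalization
sc.pp.log1p(adata)

ft = "AT1G65480"                                  # assumed FT locus ID
# record FT per nucleus before subsetting to variable genes
# (.toarray() assumes a sparse expression matrix)
adata.obs["FT_expr"] = adata[:, ft].X.toarray().ravel()

sc.pp.highly_variable_genes(adata, n_top_genes=2000)
adata = adata[:, adata.var.highly_variable]
sc.pp.pca(adata, n_comps=30)
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.leiden(adata)                               # graph-based clustering

# the cluster with the highest mean FT expression is the candidate
# FT-high companion-cell population the abstract describes
print(adata.obs.groupby("leiden")["FT_expr"].mean().sort_values())
```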
-
Seasonal changes in spring induce flowering by expressing the florigen, FLOWERING LOCUS T (FT), in Arabidopsis. FT is expressed in unique phloem companion cells with unknown characteristics. The question of which genes are co-expressed with FT and whether they have roles in flowering remains open. Through tissue-specific translatome analysis, we discovered that under long-day conditions with the natural sunlight red/far-red ratio, the FT-producing cells express a gene encoding FPF1-LIKE PROTEIN 1 (FLP1). The master FT regulator, CONSTANS (CO), controls FLP1 expression, suggesting FLP1's involvement in the photoperiod pathway. FLP1 promotes early flowering independently of FT, is active in the shoot apical meristem, and induces the expression of SEPALLATA 3 (SEP3), a key E-class homeotic gene. Unlike FT, FLP1 facilitates inflorescence stem elongation. Our cumulative evidence indicates that FLP1 may act as a mobile signal. Thus, FLP1 orchestrates floral initiation together with FT and promotes inflorescence stem elongation during reproductive transitions.